Cross-validation: the illusion of reliable performance estimation

Authors

  • Zoltán Prekopcsák
  • Tamás Henk
  • Csaba Gáspár-Papanek
Abstract

In data mining, we are often faced with the task of estimating model performance from training data. This estimate is supposed to express the expected performance on future, previously unseen data, and it is needed both for business decisions and for the analyst to compare different models. One of the most widely used performance estimation techniques is cross-validation, which is increasingly misused. This paper describes common mistakes in using cross-validation that significantly distort the estimates, presents several numerical examples of how misleading the estimates can be, and proposes a data mining process for ensuring valid performance esti-

Similar resources

A Neural Network Model Based on Support Vector Machine for Conceptual Cost Estimation in Construction Projects

Estimating conceptual costs in construction projects is an important part of feasibility studies, and it has a major impact on the success of these projects. The estimate provides the information needed for cost management and budgeting. The purpose of this paper is to introduce an intelligent model to im...

A New Approach for Determination of Neck-Pore Size Distribution of Porous Membranes via Bubble Point Data

Reliable estimation of the neck-pore size distribution (NPSD) of porous membranes is the key element in the design and operation of all membrane separation processes. In this paper, a new approach is presented for reliable estimation of the NPSD of porous membranes using wet flow-state bubble point test data. For this purpose, a robust method based on linear regularization theory is developed to extract NPSD...

Large-scale Inversion of Magnetic Data Using Golub-Kahan Bidiagonalization with Truncated Generalized Cross Validation for Regularization Parameter Estimation

In this paper a fast method for large-scale sparse inversion of magnetic data is considered. The L1-norm stabilizer is used to generate models with sharp and distinct interfaces. To deal with the non-linearity introduced by the L1-norm, a model-space iteratively reweighted least squares algorithm is used. The original model matrix is factorized using the Golub-Kahan bidiagonalization that proje...

Neural Network Ensembles, Cross Validation, and Active Learning

Learning of continuous valued functions using neural network ensembles (committees) can give improved accuracy, reliable estimation of the generalization error, and active learning. The ambiguity is defined as the variation of the output of ensemble members averaged over unlabeled data, so it quantifies the disagreement among the networks. It is discussed how to use the ambiguity in combination...

Fiscal Illusion in Iranian Economy Emphasizing the Five-Dimensional Indicators and the NARDL Approach

The phenomenon of fiscal illusion has always been an intriguing topic in the public finance literature. Fiscal illusion is a concept in which misinterpretation of fiscal parameters, tax expenses, and liabilities leads to bias in budgetary decision making at all levels of government. The current research presents an empirical analysis of fiscal illusion in the Iranian economy, using fiv...

Publication date: 2010